Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning
نویسندگان
چکیده
Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming respective sequence words. This paper presents a deep learning ASR system optimization and evaluation for European Portuguese language. We present pipeline composed several stages data acquisition, analysis, pre-processing, model creation, evaluation. A transfer approach proposed considering an English language-optimized starting point; target Portuguese; contribution to by source from different domain consisting multiple-variant language dataset, essentially Brazilian Portuguese. adaptation was investigated between mixed (mostly Brazilian) The used NVIDIA NeMo framework implementing QuartzNet15×5 architecture based on 1D time-channel separable convolutions. Following this data-centric approach, optimized, achieving state-of-the-art word error rate (WER) 0.0503.
منابع مشابه
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...
متن کاملDIXI - A Generic Text-to-Speech System for European Portuguese
This paper describes a new generic text-to-speech synthesis system, developed in the scope of the Tecnovoz Project. Although it was primarily targeted at speech synthesis in European Portuguese, its modular architecture and flexible components allows its use for different languages. We also provide a survey on the development of the language resources needed by the TTS.
متن کاملLow-Resource Speech-to-Text Translation
Speech-to-text translation has many potential applications for low-resource languages, but the typical approach of cascading speech recognition with machine translation is often impossible, since the transcripts needed to train a speech recognizer are usually not available for low-resource languages. Recent work has found that neural encoder-decoder models can learn to directly translate foreig...
متن کاملNamed Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملDomain Adaptation For Formant Estimation Using Deep Learning
In this paper we present a domain adaptation technique for formant estimation using a deep network. We first train a deep learning network on a small read speech dataset. We then freeze the parameters of the trained network and use several different datasets to train an adaptation layer that makes the obtained network universal in the sense that it works well for a variety of speakers and speec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Future Internet
سال: 2023
ISSN: ['1999-5903']
DOI: https://doi.org/10.3390/fi15050159